Predictive Models for Consistency Index of a Data Object in a Replicated Distributed Database System
Authors
Abstract
Consistency is a qualitative measure of database performance. The Consistency Index (CI) quantifies the consistency of a data unit as the percentage of correct reads among the total reads observed on that data unit in a given time. The consistency guarantee of a replicated database logically depends on the number of reads, the number of updates, the number of replicas, and the workload distribution over time. The objective of our work is to establish this dependency and to find the level of interaction of these factors with consistency guarantees, in order to develop a predictor model for CI. We implemented the TPC-C (Transaction Processing Performance Council) online transaction benchmark on Amazon SimpleDB, which serves as the big-data store. We controlled the database design parameters and measured CI over 100 samples of workload and database design. The findings helped us implement a prototype of a CI-based consistency predictor using three statistical predictive techniques: (a) a regression model, (b) a multilayer perceptron neural network model, and (c) a hidden Markov model. The statistics show that the neural-network-based CI predictor yields lower error, with a better coefficient of determination (R²) and mean squared error (MSE). The hidden-Markov-model-based CI predictor can model the effect of the sequence of the input workload on the probability of obtaining a desired CI.
Keywords: Consistency Index (CI), predictive models, regression, neural network, hidden Markov model
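As a rough illustration of the quantities named in the abstract, the sketch below computes CI as the percentage of correct reads and evaluates a predictor with MSE and the coefficient of determination. The function names and the sample numbers are hypothetical and are not taken from the paper; the formulas are the standard textbook definitions, which the paper may apply differently.

```python
from typing import Sequence


def consistency_index(correct_reads: int, total_reads: int) -> float:
    """CI: percentage of correct reads among all reads in an observation window."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= correct_reads <= total_reads:
        raise ValueError("correct_reads must lie in [0, total_reads]")
    return 100.0 * correct_reads / total_reads


def mean_squared_error(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Average squared difference between observed and predicted CI values."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)


def r_squared(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    print(consistency_index(45, 50))
    print(mean_squared_error([90.0, 80.0], [88.0, 82.0]))
    print(r_squared([90.0, 80.0], [88.0, 82.0]))
```

A lower MSE and an R² closer to 1 indicate a better predictor, which is the sense in which the abstract compares the neural network model with the other models.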
Similar resources
Issues in Replicated data for Distributed Real-Time Database Systems
Replication is an interesting research area in both distributed and real-time database systems. In this paper, we provide an overview comparing the replication techniques available for these database systems. Data consistency and scalability are the issues considered in this paper. Those issues are maintaining consistency between the actual state of the real-time object of ...
Separating indexes from data: a distributed scheme for secure database outsourcing
Database outsourcing is an idea to eliminate the burden of database management from organizations. Since data is a critical asset of organizations, preserving its privacy from outside adversary and untrusted server should be warranted. In this paper, we present a distributed scheme based on storing shares of data on different servers and separating indexes from data on a distinct server. Shamir...
Enhancing the Availability of Networked Database Services by Replication and Consistency Maintenance
We describe an operational middleware platform for maintaining the consistency of replicated data objects, called COPla (Common Object Platform). It supports both eager and lazy update propagation for replicated data in networked relational databases. The purpose of replication is to enhance the availability of data objects and services in distributed database networks. Orthogonal to recovery s...
Performance Modeling of Distributed and Replicated Databases
This paper surveys performance models for distributed and replicated database systems. Over the last 20 years a variety of such performance models have been developed and they differ in (1) which aspects of a real system are or are not captured in the model (e.g. replication, communication, non-uniform data access, etc.) and (2) how these aspects are modeled. We classify the different alternati...
The “Virtual-Primary-Copy Approach” Compared To Other Approaches With Weak Consistent Data Replication
Replication is used in distributed systems in order to increase availability and performance. Unfortunately, consistency requirements usually reduce the potential benefits of data replication since updates have to be propagated synchronously to all copies of a replicated data item. Traditionally, consistency requirements in distributed databases are very strong, which is a result of distributio...